-
- Stringer v1.1 © 1996 by Henri Veisterä
- ======================================
- --------------------------------------
-
-
- 0.0 Document Index
- ==================
-
- 1.0 Overview
- 1.1 Requirements
- 1.2 The Story
- 2.0 Usage
- 2.1 Command line options
- 2.2 Output control
- 2.3 Script file usage
- 2.4 Preferences
- 3.0 Example command lines
- 3.1 Example 1 - Vanilla
- 3.2 Example 2 - Multiple strings
- 3.3 Example 3 - A quick scan
- 3.4 Example 4 - Complex patterns
- 3.5 Example 5 - Statistics
- 3.6 Example 6 - Hunt files
- 3.7 Example 7 - Relative frequencies
- 4.0 The test section
- 4.1 Test 1 - Algorithm test
- 4.2 Test 2 - Multiple small file test
- 4.3 Test 3 - Multiple large file test
- 4.4 Test 4 - Multiple string test
- 4.5 Grande Test Total
- 4.6 Buffer efficiency graph
- 5.0 Author and support
- 6.0 Version history
-
-
-
- 1.0 Overview
- ============
-
- 1.1 Requirements
- ----------------
-
- Stringer needs AmigaOS 2.1 (V37) or later and a minimum of 18 kb of free
- memory to operate. Stringer is pure and can be made resident.
-
-
- 1.2 The Story
- -------------
-
- Yes, yet another Search replacement ... with bells on it. As it happens,
- I wrote this purely for my own usage and amusement and was surprised to see
- that there were so many similar utils already out there. This one started
- its life as a tool for counting strings, mainly very short strings (as in
- counting the relative frequencies of the letters of the alphabet) and it
- just grew from there.
-
- What then makes this one oh, so special ? Why should _YOU_ switch
- immediately to Stringer ? Well, it's the fastest one around (see the test
- section). Stringer is a complete in-place replacement for Search emulating
- _every_ function of Search with identical output (if the user so desires).
- Stringer specializes in searching for multiple strings. The strings can be
- of any size, there can be any number of strings, the strings can be read
- from a file and the strings can be substrings of other strings searched
- for. If you frequently search for multiple words for whatever reason, no
- other util comes near in speed and in ease of use.
-
- Stringer is more than just a fancy replacement for Search. It also
- displays statistical information on the matched strings: their match
- counts, relative space usage and relative frequencies.
-
-
-
- 2.0 Usage
- =========
-
- Stringer is run from the CLI with the basic command line looking like this:
-
- Stringer SOURCEFILES STRINGS [OPTIONS]
-
- Stringer behaves exactly like Search with its default options so you can
- immediately use it as you did Search before.
-
-
- 2.1 Command line options
- ------------------------
-
- The full command line usage is as follows:
-
- FROM/M/A      - Source files to search from
- SEARCH/A      - Strings to search for or a file name containing the strings.
-                 Separate strings with "," in CLI or "<CR>" in a file
- B=BUFFER/K/N  - Work buffer size in kilobytes, defaults to 100 kb
- C=CASE/S      - Case sensitive search
- N=NONUMS/S    - No line numbering
- V=VERBOSE/S   - Show file size and number and size of strings
- T=SHOWTIME/S  - Show elapsed time
- S=STATS/S     - Display search statistics
- L=NOLINES/S   - Don't show lines where a match was found
- A=ALL/S       - Recursively scan the file tree
- M=MATCHES/S   - Show number of matches in current file
- Q=QUICK/S     - Fewer CRs in the CLI output
- I=INVERT/S    - Show lines that do not match the search criteria
- P=PATTERN/S   - SEARCH is an AmigaDOS pattern
- F=FILE/S      - Hunt for a filename, SEARCH=filename to search for
- O=ONLY/S      - Show only names of files where a match was found
- BIN/S         - Force Stringer to search through binary files
- ANSI/S        - Highlight matched string in the output
- MAXHIT/K/N    - Give up on a file when MAXHIT matches are found
- QUIET/S       - No output to CLI, only a result code
- FILECOL/K/N   - Color for filenames (a pen number from 0 to n)
- HITCOL/K/N    - Color for a matched string (a pen number from 0 to n)
- STRFILE/S     - Load strings from a file, SEARCH=filename
- NOPREFS/S     - Ignore ENV:Stringer.prefs
-
- Yeah, I know, all the stuff in Search, Grep and FlashFind crammed into one
- and then some more. See the command line examples for a more in-depth look
- at the options.
-
-
- 2.2 Output control
- ------------------
-
- Specifying the L=NOLINES option means that the matched lines are not
- shown, so on its own it leaves you with nothing to look at. Use it only in
- combination with the M=MATCHES and/or S=STATS options. With the QUIET
- option you won't get any output at all except the error messages, not even
- the filenames. The O=ONLY option displays only the names of the files where
- a match was found. In effect this is the same as specifying L=NOLINES,
- M=MATCHES and Q=QUICK, although the output looks different in each case.
-
- Use the ANSI option to add some color to your output. You can control the
- colors with the FILECOL and HITCOL options.
-
- Note that if you use the MAXHIT and S=STATS options together the stats
- won't be correct since the search is aborted when MAXHIT hits are found.
- MAXHIT is ignored if the option L=NOLINES is on (as it should be on, when
- you want to see the statistics).
-
- While Stringer is running you can abort the search with CTRL-C or you can
- skip the current file with CTRL-D.
-
-
- 2.3 Script file usage
- ---------------------
-
- Stringer exits with these return codes:
- 0 = One or more matches found
- 5 = No matches found
- 20 = A fatal error occurred
-
- Use Stringer in a script file like this:
-
- --- AmigaDOS script file ---
- Stringer Text.doc PoroJesse QUIET
-
- If warn
- Echo "Didn't find PoroJesse in Text.doc."
- Else
- Echo "Found the string PoroJesse in Text.doc !"
- EndIf
- --- AmigaDOS script file ---
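-
- The return codes can also be handled outside AmigaDOS. The following is a
- minimal Python sketch, not part of Stringer: the names RETURN_CODES,
- describe and is_warn are made up for illustration. It assumes only the
- three codes listed above and the AmigaDOS rule that "If WARN" is true for
- any return code of 5 or higher, which means the "Didn't find" branch of the
- script above is also taken after a fatal error.
-
```python
# Hypothetical helper names; only the three codes come from the Stringer docs.
RETURN_CODES = {
    0: "one or more matches found",
    5: "no matches found",
    20: "a fatal error occurred",
}


def describe(rc: int) -> str:
    """Return a readable meaning for a Stringer return code."""
    return RETURN_CODES.get(rc, f"unexpected return code {rc}")


def is_warn(rc: int) -> bool:
    """Mimic AmigaDOS 'If WARN': true when the return code is 5 or higher."""
    return rc >= 5


if __name__ == "__main__":
    for rc in (0, 5, 20):
        print(rc, describe(rc), "WARN" if is_warn(rc) else "ok")
```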
-
-
- 2.4 Preferences
- ---------------
-
- Stringer reads your preferences from a file called ENV:Stringer.prefs.
- These options are read in first and thus your command line options are
- 'added' to these. To temporarily disable the prefs use the command line
- option NOPREFS.
-
- To set your preferences for Stringer do this in CLI:
-
- Setenv Stringer.prefs ANSI MAXHIT=200 QUICK BUFFER=400
-
- To make these prefs permanent do this:
-
- Copy ENV:Stringer.prefs ENVARC:Stringer.prefs
-
-
-
- 3.0 Example command lines
- =========================
-
- 3.1 Example 1
- -------------
-
- Stringer Include/#? ggi_RastPort ALL
-
- Straightforward stuff ... walk through the Include directories searching
- for the string ggi_RastPort. The search is not case sensitive. This will
- produce the same output as the C: command Search, only a bit faster.
-
-
- 3.2 Example 2
- -------------
-
- Stringer #?new#?.txt "STRING ONE,STRING TWO"
-
- This example shows the quick way to search for multiple strings from
- multiple files. The strings include spaces so they are enclosed in quotes.
- Each string is separated with a comma. If you want to search for a string
- that includes a comma use two commas in a row : "like I said,, then"
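-
- The comma rule can be sketched in a few lines of Python. This is only an
- illustration of the rule as described above, not Stringer's own parser:
- the function name split_search_strings is made up, and pairing commas
- strictly from left to right is an assumption for inputs with three or more
- commas in a row.
-
```python
def split_search_strings(arg: str) -> list[str]:
    """Split a SEARCH argument on single commas; ',,' is one literal comma."""
    strings: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(arg):
        if arg[i] == ",":
            if i + 1 < len(arg) and arg[i + 1] == ",":
                current.append(",")  # doubled comma: keep one literal comma
                i += 2
                continue
            strings.append("".join(current))  # single comma: end of a string
            current = []
        else:
            current.append(arg[i])
        i += 1
    strings.append("".join(current))
    return strings


if __name__ == "__main__":
    print(split_search_strings("STRING ONE,STRING TWO"))
    print(split_search_strings("like I said,, then"))
```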
-
-
- 3.3 Example 3
- -------------
-
- Stringer dh0:sc/include/#? ALL SEARCH=select B=500 CASE QUICK L M
-
- Here we have a quick scan of all the files in this directory and all its
- child directories. We scan for the word 'select' and the final output will
- only show the names of the files where the string was found and a count of
- matches. The name of the file we currently scan is shown but erased by the
- next name. Like this:
-
- --- Stringer Output ---
- HDisk0:sc/include/intuition/imageclass.h
- 2 matches
- HDisk0:sc/include/intuition/intuition.h
- 22 matches
- HDisk0:sc/include/intuition/preferences.h
- 2 matches
- HDisk0:sc/include/intuition/screens.h
- 1 matches
- HDisk0:sc/include/libraries/asl.h
- 5 matches
- HDisk0:sc/include/libraries/gadtools.h
- 4 matches
- HDisk0:sc/include/libraries/diskfonttag.h
- 2 matches
- HDisk0:sc/include/resources/battmembitsamiga.h
- 2 matches
- --- Stringer Output ---
-
-
- 3.4 Example 4
- -------------
-
- Stringer FROM imageclass.h SEARCH=#?0x0[0-9A-F]L#? PATTERN
-
- If you want to search for a complex AmigaDOS pattern use the option
- PATTERN. Note that there is no sense in using a pattern like
- SEARCH=(#?keput#?|#?sossut#?) since SEARCH=keput,sossut without the option
- PATTERN will give the same result and much faster. In this example we look
- for lines that include an ASCII hex number from 0x00L to 0x0FL. Note too
- that the option CASE applies here also. And a final note: the pattern is
- matched against each line in the source file, meaning that a pattern
- SEARCH=timeout would only match a line that contains nothing but the word
- timeout.
-
- Here's how to make up an AmigaDOS pattern:
-
- --- illegally/without permission quoted from the autodocs ---
-
- The patterns are fairly extensive, and approximate some of the ability
- of Unix/grep "regular expression" patterns. Here are the available
- tokens:
-
- ? Matches a single character.
- # Matches the following expression 0 or more times.
- (ab|cd) Matches any one of the items separated by '|'.
- ~ Negates the following expression. It matches all strings
- that do not match the expression (aka ~(foo) matches all
- strings that are not exactly "foo").
- [abc] Character class: matches any of the characters in the class.
- a-z Character range (only within character classes).
- % Matches 0 characters always (useful in "(foo|bar|%)").
- * Synonym for "#?", not available by default in 2.0. Available
- as an option that can be turned on.
-
- "Expression" in the above table means either a single character
- (ex: "#?"), or an alternation (ex: "#(ab|cd|ef)"), or a character
- class (ex: "#[a-zA-Z]").
-
- --- illegally/without permission quoted from the autodocs ---
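-
- To see why SEARCH=timeout with PATTERN only matches a line consisting of
- the word timeout, here is a rough Python sketch of whole-line matching. It
- is not how Stringer or dos.library does it: it translates only a small
- subset of the tokens above ('#?', '?', plain '[...]' classes and literal
- characters) into a Python regular expression, and the function names are
- made up for illustration. Negation, alternation, '%' and '#' in front of
- anything other than '?' are deliberately left out.
-
```python
import re


def amiga_subset_to_regex(pattern: str) -> str:
    """Translate '#?', '?', '[...]' and literals; other tokens are unsupported."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("#?", i):
            out.append(".*")  # '#?' matches any run of characters
            i += 2
        elif pattern[i] == "?":
            out.append(".")  # '?' matches a single character
            i += 1
        elif pattern[i] == "[":
            end = pattern.index("]", i)
            out.append(pattern[i:end + 1])  # plain character class, kept as-is
            i = end + 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "".join(out)


def line_matches(pattern: str, line: str, case_sensitive: bool = False) -> bool:
    """Match the pattern against the whole line, as the PATTERN option does."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.fullmatch(amiga_subset_to_regex(pattern), line, flags) is not None


if __name__ == "__main__":
    print(line_matches("timeout", "set the timeout value"))
    print(line_matches("#?timeout#?", "set the timeout value"))
```
-
- The first call reports no match and the second reports a match, which is
- why a pattern normally needs '#?' at both ends.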
-
-
- 3.5 Example 5
- -------------
-
- Stringer Text.txt Strings.txt STRFILE B 300 C V T S L
-
- This is a typical statistical output command line. You get output looking
- quite different from Search. We go through Text.txt looking for the
- strings in the file Strings.txt. Strings.txt might look like this:
-
- String1
- Stringer2
- Str3
-
- So, it's a text file where the strings are separated with CR's. Internal
- buffer used is 300 kilobytes in size, matching is case sensitive, you'll
- see the file size and the number of strings in the file Strings.txt and
- their collective size. You will see something like this:
-
- --- Stringer Output ---
- Searching 1801657 bytes of data for 3 strings (total 20 bytes) ...
- String Hits All Relat
- ' String1' 95889 53 964
- ' Stringer2' 2557 5 25
- ' Str3' 1001 2 10
- -------------------------------------------------------------------
- TOTAL: 3 strings 99447 60 1000
- Time elapsed: 10 secs (7411549 ticks, 709379 ticks/sec)
- --- Stringer Output ---
-
- This tells you the following:
-
- String1 occurs 95889 (Hits column) times in the text, 96.4 (Relat column)
- percent of the matched strings were String1. String1 makes up 5.3 (All
- column) percent of the whole text.
-
- Stringer2 occurs 2557 times in the text, 2.5 % of the matched strings were
- Stringer2. Stringer2 takes up approx. 9000 bytes (= 0.005 * 1801657) of the
- 1801657 bytes in the whole file.
-
- Str3 occurs 1001 times in the text, 1.0 % of the matched strings were Str3.
-
- The TOTAL line gives the added sums of all the above. The whole operation
- took 10 seconds.
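-
- The arithmetic behind the Relat column and the byte estimate can be
- reproduced with a short Python sketch. The function names are made up for
- illustration, and rounding down to whole per mille is an assumption
- inferred from the sample figures (2557 hits out of 99447 is 25.7 per mille
- but is shown as 25). The All column is taken as printed and not recomputed.
-
```python
def relat_per_mille(hits: list[int]) -> list[int]:
    """Share of each string among all hits, in per mille, rounded down."""
    total = sum(hits)
    if total == 0:
        return [0 for _ in hits]
    return [h * 1000 // total for h in hits]


def bytes_from_all(all_per_mille: int, file_size: int) -> int:
    """Approximate bytes taken up by a string, from its All column value."""
    return all_per_mille * file_size // 1000


if __name__ == "__main__":
    # Hits for String1, Stringer2 and Str3 from the sample output above.
    print(relat_per_mille([95889, 2557, 1001]))
    # All = 5 per mille for Stringer2 in a file of 1801657 bytes.
    print(bytes_from_all(5, 1801657))
```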
-
-
- 3.6 Example 6
- -------------
-
- Stringer DH0: DH1: DH2: #?.txt ALL FILE
-
- Hunt volumes DH0:, DH1: and DH2: for any files whose names end in .txt.
- This quickly scans the whole directory structure of the three volumes above
- and displays any files found matching the search criteria.
-
-
- 3.7 Example 7
- -------------
-
- Stringer bigtext.txt a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z L S >strout.txt
-
- Go through the file bigtext.txt and count every letter of the alphabet in
- the text. Output statistics to a file called strout.txt.
-
- Sort FROM strout.txt TO sorted.txt COLSTART 45 NUMERIC
-
- Use the AmigaDOS command Sort and you get output like this (file sorted.txt):
-
- --- Stringer Output ---
- String Hits All Relat
- ' Z' 1397 0 1
- ' Q' 1425 0 1
- ' X' 3849 2 3
- ' J' 5831 3 5
- ' V' 9807 6 8
- ' B' 17773 10 16
- ' P' 19669 12 17
- ' W' 19748 12 17
- ' F' 21239 13 19
- ' K' 21733 13 19
- ' G' 22208 13 20
- ' Y' 23191 14 20
- ' M' 32917 20 29
- ' C' 35719 21 32
- ' L' 38968 23 35
- ' U' 39221 24 35
- ' D' 40567 24 36
- ' H' 48059 29 43
- ' R' 64312 39 57
- ' S' 70609 43 63
- ' N' 77408 47 69
- ' I' 80250 49 72
- ' O' 84288 51 76
- ' T' 99484 61 89
- ' A' 102745 63 92
- ' E' 126617 77 114
- -------------------------------------------------------------------
- TOTAL: 26 strings 1109034 669 1000
- --- Stringer Output ---
-
- If the file bigtext.txt was an ASCII text file in English you can see from
- the above the relative frequencies of the letters in the English language
- (or a good approximation of them). There is a 6.9 percent probability that
- the next letter you see is the letter 'N'. The most common letter is the
- letter 'E'.
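-
- For comparison, the same kind of count can be sketched in Python. This is
- only an illustration and not Stringer's algorithm: the function name
- letter_stats is made up, letters are counted without regard to case (the
- command line above does not use CASE), only letters that actually occur
- are listed, and the per mille values are rounded down.
-
```python
from collections import Counter
import string


def letter_stats(text: str) -> list[tuple[str, int, int]]:
    """Return (letter, hits, per mille of all letter hits), sorted by hits."""
    counts = Counter(ch for ch in text.upper() if ch in string.ascii_uppercase)
    total = sum(counts.values())
    rows = [(letter, n, n * 1000 // total) for letter, n in counts.items()]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for letter, hits, relat in letter_stats("Tea tee"):
        print(f"'{letter}' {hits} {relat}")
```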
-
-
-
- 4.0 The test section
- ====================
-
- I spent a day compiling this stuff, so you better read it through. A quick
- summary can be found at the end of this section. In all the tests I
- experimented with different command line options to get the best results,
- and all output was redirected to NIL:. Some of the utils don't support
- searching multiple files, multiple directories or multiple strings, so for
- these utils I created script files that simply run them multiple times.
-
- Test hardware:
- 68030/28 MHz, 4 MB/32 bit Mem, Seagate 501 MB SCSI-2 HD, OS3.1
-
- Test software:
- Search 40.1 by AmigaDOS
- ssearch 1.4 by Stefan Sticht
- FSearch 1.2 by Edwin H. Bielawski
- ZSearch 1.0c by Alessandro Zummo, 030 version used
- FlashFind 1.2 by Frank Würkner
- Stringer 1.1 by Henri Veisterä
-
-
- 4.1 Test 1
- ----------
-
- Source file is 1801657 bytes of ascii text with variable line lengths.
- Search one string (strlen = 7 bytes) case sensitive from RamDisk. The
- string is found two times in the source file.
-
- Util        Time/s  RelSpeed
-
- search        92.9      84.5
- fsearch       15.3      13.9
- ssearch       11.1      10.1
- zsearch        3.7       3.4
- flashfind      1.1       1.0
- stringer       1.1       1.0
-
- This is a test of just the search algorithms, file access times are minimal
- here. FlashFind loses by a few hundredths of a second.
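-
- The RelSpeed column in these tables is each util's time divided by the
- best time in the same test, rounded to one decimal. A minimal Python
- sketch, using the Test 1 times from the table above (the function name
- rel_speed is made up for illustration):
-
```python
def rel_speed(times: dict[str, float]) -> dict[str, float]:
    """Divide every time by the fastest one and round to one decimal."""
    best = min(times.values())
    return {util: round(t / best, 1) for util, t in times.items()}


if __name__ == "__main__":
    test1 = {
        "search": 92.9,
        "fsearch": 15.3,
        "ssearch": 11.1,
        "zsearch": 3.7,
        "flashfind": 1.1,
        "stringer": 1.1,
    }
    for util, rel in rel_speed(test1).items():
        print(f"{util:<10}{rel:>6}")
```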
-
-
- 4.2 Test 2
- ----------
-
- Source files are the Commodore assembler include files 218 files - 16 dirs
- - 871692 bytes. Searching for one string 11 bytes in size which is found
- three times in one of the source files. Search is case sensitive from a
- hard drive.
-
- Util          Time/s  RelSpeed
-
- search          99.2       4.3
- fsearch         57.1       2.5
- zsearch     n/a,47.1       2.0
- ssearch         41.9       1.8
- flashfind       35.9       1.6
- stringer        23.0       1.0
-
- Lots of small files (average file size ~4000 bytes) to go through so the
- importance of the search algorithm is less here, a large portion of the
- time is spent loading the files. Stringer wins through efficient multiple
- file implementation. ZSearch can't search multiple files so the time here
- is from a script file created for this test.
-
-
- 4.3 Test 3
- ----------
-
- Commodore AutoDocs 30 files - 1768252 bytes. Searching for one string
- (7 bytes). String is found 47 times in the source files. Search is
- case sensitive from a hard drive.
-
- Util          Time/s  RelSpeed
-
- search         159.8      24.2
- ssearch         63.4       9.6
- fsearch         63.0       9.5
- flashfind       52.7       8.0
- zsearch     n/a,14.4       2.2
- stringer         6.6       1.0
-
- Stringer really shines through here. Easily the fastest 8-P.
-
-
- 4.4 Test 4 - The really unfair test =)
- --------------------------------------
-
- Commodore AutoDocs 30 files - 1768252 bytes. Searching for three strings
- (20 bytes). Strings are found 375 times in the source files. Search is
- not case sensitive and from a hard drive.
-
- Util             Time/s  RelSpeed
-
- search            554.2      38.5
- fsearch       n/a,202.6      14.1
- ssearch       n/a,195.7      13.6
- flashfind   353.1,159.1      11.0
- zsearch        n/a,43.9       3.0
- stringer           14.4       1.0
-
- As you can see, this is why I made Stringer in the first place so it should
- show here ... and it does :-! Only Stringer, FlashFind and Search have
- the capability to search for multiple strings. All the results after the
- commas are from script files created for this test.
-
-
- 4.5 Grande Test Total
- ---------------------
-
-           Test1        Test2        Test3        Test4        Total
-           Time/s  Rel  Time/s  Rel  Time/s  Rel  Time/s  Rel  Time/s  Rel
-
- search     92.9  84.5   99.2   4.3  159.8  24.2  554.2  38.5  906.1  20.1
- fsearch    15.3  13.9   57.1   2.5   63.0   9.5  202.6+ 14.1  338.0   7.5
- ssearch    11.1  10.1   41.9   1.8   63.4   9.6  195.7+ 13.6  312.1   6.9
- flashfind   1.1   1.0   35.9   1.6   52.7   8.0  159.1+ 11.0  248.8   5.5
- zsearch     3.7   3.4   47.1+  2.0   14.4+  2.2   43.9+  3.0  109.1   2.4
- stringer    1.1   1.0*  23.0   1.0*   6.6   1.0*  14.4   1.0*  45.1   1.0
-
- A * marks the best result, a + means that the util didn't support this test
- and a script file was used instead.
-
- So, twice as fast as the nearest contender ... but since zsearch is really
- limited in its uses I could say 5 to 20 times as fast as the rest ! Most
- pleasing for me is that Stringer is at its best at just the kind of work I
- use a string searcher for (tests 3 and 4). Curiously enough, the only util
- whose search algorithm is more powerful than Stringer's (FlashFind is faster
- when strlen > 7) fails totally in real world performance. I suspect this
- has to do with faulty asynchronous I/O routines.
-
-
- 4.6 Buffer efficiency graph
- ---------------------------
-
- No test section would be complete without a beautiful graph. Thus, for your
- enjoyment: the following graph represents the time spent searching for one
- string from a 1759 kb source file with varying buffer sizes.
-
- [Graph: elapsed time, 1.10 to 1.80 seconds, plotted against buffer size,
- 1 to 19 in units of 100 kb]
-
- On the y-axis is the elapsed time in seconds and on the x-axis is the
- buffer size in units of 100 kilobytes. The asterisks '*' are measured data
- points and the points '·' in between were interpolated.
-
- From this you can see that beyond 1200 kb the usefulness of a greater
- buffer size is dubious. Note that the low points are situated at 400 kb
- intervals due to exec.library's internal optimizations. So, in other words,
- if you want to use a greater default buffer size use 400, 800 or 1200 kb.
-
-
-
- 5.0 Author and support
- ======================
-
- Stringer was written in 100% Assembler by me, Henri Veisterä. For bug
- reports, fan/hate-mail or general chit-chat you can contact me through the
- net via: hveister@snakemail.hut.fi. Stringer is Freeware. Give it away,
- don't sell it. The latest version will always be posted to AmiNet.
-
-
-
- 6.0 Version history
- ===================
-
- Version 1.0 - 26-Jun-93 ... Ancient stats shower
-
- Version 1.1 - 16-Aug-96 ... First public release
-
-
-